Systolic block Householder transformation for RLS algorithm with two-level pipelined implementation

Authors

  • K. J. Ray Liu
  • Shih-Fu Hsieh
  • Kung Yao
Abstract

The QR decomposition recursive least squares (QRD RLS) algorithm is one of the most promising RLS algorithms because of its robust numerical stability and its suitability for VLSI implementation on a systolic array architecture. Up to now, among the many techniques for computing the QR decomposition, only the Givens rotation and modified Gram-Schmidt methods have been successfully applied to the development of QRD RLS systolic arrays. It is well known that the Householder transformation (HT) outperforms the Givens rotation method under finite-precision computation. However, there is presently no known technique for implementing the HT on a systolic array architecture. In this paper, we propose a systolic block Householder transformation (SBHT) approach that implements the HT on a systolic array, and we apply it to the RLS algorithm. Since the data are fetched in blocks, the vectorized array in general requires vector operations. However, a modified HT algorithm permits a two-level pipelined implementation of the SBHT systolic array, at both the vector and the word level. The throughput rate can be as fast as that of the Givens rotation method. Our approach makes the HT amenable to VLSI implementation and applicable to real-time, high-throughput applications in modern signal processing. The constrained RLS problem using the SBHT RLS systolic array is also considered in this paper.
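The block update at the core of Householder-based QRD RLS can be illustrated in software. The following is a minimal sequential sketch in Python, not the SBHT systolic array itself: it folds a block of p new data rows into the upper-triangular factor R and the rotated desired-signal vector z using Householder reflections, then recovers the weights by back-substitution. The exponential forgetting factor `lam` and the helper names `householder_block_update` and `back_substitute` are hypothetical, chosen only for illustration. Because the old factor is already triangular, each reflector touches only row k and the p new rows, which is what keeps a block update inexpensive.

```python
import math


def householder_block_update(R, z, X, d, lam=1.0):
    """Fold a block of p new rows (X, d) into the triangular pair (R, z).

    R is n x n upper-triangular, z has length n, X is p x n, d has length p,
    and lam in (0, 1] is an exponential forgetting factor (an assumption).
    Returns the updated (R, z); R stays upper-triangular and R w = z
    gives the least-squares weights for all data seen so far.
    """
    n = len(R)
    beta = math.sqrt(lam)
    # Stack [beta*R | beta*z] on top of [X | d]; the last column is the RHS.
    A = [[beta * R[i][j] for j in range(n)] + [beta * z[i]] for i in range(n)]
    A += [[float(x) for x in X[r]] + [float(d[r])] for r in range(len(X))]
    data_rows = list(range(n, len(A)))
    for k in range(n):
        # The top block is triangular, so below the diagonal of column k only
        # the p data rows can be nonzero: the reflector touches row k and them.
        idx = [k] + data_rows
        x = [A[i][k] for i in idx]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # nothing to annihilate (rank-deficient column)
        alpha = -norm_x if x[0] >= 0 else norm_x  # sign avoids cancellation
        v = x[:]
        v[0] -= alpha
        vtv = sum(t * t for t in v)
        # Apply H = I - 2 v v^T / (v^T v) to columns k..n, including the RHS.
        for j in range(k, n + 1):
            f = 2.0 * sum(v[t] * A[i][j] for t, i in enumerate(idx)) / vtv
            for t, i in enumerate(idx):
                A[i][j] -= f * v[t]
    R_new = [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    z_new = [A[i][n] for i in range(n)]
    return R_new, z_new


def back_substitute(R, z):
    """Solve the upper-triangular system R w = z for the weight vector w."""
    n = len(R)
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (z[i] - sum(R[i][j] * w[j] for j in range(i + 1, n))) / R[i][i]
    return w


if __name__ == "__main__":
    w_true = [2.0, -1.0, 0.5]
    R, z = [[0.0] * 3 for _ in range(3)], [0.0] * 3
    for X in ([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], [[1, 2, 3], [2, -1, 0]]):
        d = [sum(a * b for a, b in zip(row, w_true)) for row in X]
        R, z = householder_block_update(R, z, X, d, lam=0.99)
        print(back_substitute(R, z))  # close to w_true for noise-free data
```

Running the file prints the weight estimate after each block; for the noise-free data used there, both estimates should agree with `w_true` up to rounding error. A systolic realization would distribute these column-by-column reflections across processing elements; the sketch makes no attempt to model that pipelining.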

Similar sources

Pipelined Cordic Based Qrd-rls Adaptive Filtering Using Matrix Lookahead

In this paper, the matrix lookahead transformation is developed to achieve fine-grain pipelining for Cordic based QRD-RLS adaptive filtering algorithm. Various implementation styles are proposed. They include pipelining, block processing, and incremental block processing. The proposed architectures can operate at arbitrarily high sample rates, and consist of only Givens and a few Gaussian rotations...

Parallel implementation of a class of algorithms linking NLMS and block RLS

In this paper, first a brief review is given of a fully pipelined algorithm for recursive least squares (RLS) estimation, based on so-called ‘inverse updating’. Then a specific class of (block) RLS algorithms is considered, which embraces normalized LMS as a special case (with block size equal to one). It is shown that such algorithms may be cast in the ‘inverse-updating RLS’ framework. This all...

Pipelined RLS adaptive architecture using relaxed Givens rotations (RGR)

In this paper, we focus on developing a new relaxed Givens rotations (RGR)-RLS algorithm and the corresponding RGR-RLS systolic array. The resulting algorithm and architecture possess fine-grain pipelining, nearly the same convergence as the QRD-RLS, good robustness for , and square-root free computation with a little area overhead.

Recursive least-squares using a hybrid Householder algorithm on massively parallel SIMD systems

Within the context of recursive least-squares, the implementation of a Householder algorithm for block updating the QR decomposition, on massively parallel SIMD systems, is considered. Initially, two implementations based on different mapping strategies for distributing the data matrices over the processing elements of the parallel computer are investigated. Timing models show that neither of th...

A systolic array for recursive least squares computations: mapping directionally weighted RLS on an SVD updating array

The original recursive least squares (RLS) computational scheme basically consists of triangular updates and triangular backsolves, and it is well known that pipelining these two separate steps on a parallel architecture as such is impossible. As a result of this, a straightforward systolic implementation with, e.g. O(n^2) processors for O(n^2) operations per update, will have a throughput wh...

Journal:
  • IEEE Trans. Signal Processing

Volume 40

Publication date: 1992